This paper addresses the question of how the brain maintains a probabilistic body state 
estimate over time from a modeling perspective. The neural Modular Modality Frame 
(nMMF) model simulates such a body state estimation process by continuously integrating 
redundant, multimodal body state information sources. The body state estimate itself is 
distributed over separate, but bidirectionally interacting modules. nMMF compares the 
incoming sensory and present body state information across the interacting modules 
and fuses the information sources accordingly. At the same time, nMMF enforces body 
state estimation consistency across the modules. nMMF is able to detect conflicting 
sensory information and to consequently decrease the influence of implausible sensor 
sources on the fly. In contrast to the previously published Modular Modality Frame 
(MMF) model, nMMF offers a biologically plausible neural implementation based on 
distributed, probabilistic population codes. Besides its neural plausibility, the neural 
encoding has the advantage of enabling (a) additional probabilistic information flow across 
the separate body state estimation modules and (b) the representation of arbitrary 
probability distributions of a body state. The results show that the neural estimates can 
detect and decrease the impact of false sensory information, can propagate conflicting 
information across modules, and can improve overall estimation accuracy due to additional 
module interactions. Even bodily illusions, such as the rubber hand illusion, can be 
simulated with nMMF. We conclude with an outlook on the potential of modeling human 
data and of invoking goal-directed behavioral control. 
1. INTRODUCTION 

Humans and other animals appear to learn and maintain a body 
schema¹ (Graziano and Botvinick, 1999; Haggard and Wolpert, 
2005), which is used to realize goal-directed behavior control. 
Evidence for knowledge about one's own body schema and 
associated body image is already found in 2-month-old children, 
indicating that this knowledge is acquired very early in 
life (von Hofsten, 2004; Rochat, 2010). The more accurate the 
infant's body schema is, the better the infant is able to separate 
the external world (von Holst and Mittelstaedt, 1950) from its 
own body and, consequently, the better the infant is able to 
actively and goal-directedly explore the world (Konczak et al., 
1995; Butz and Pezzulo, 2008). Developmental as well as neuroscientific 
evidence indicates that developing a body schema is 
critical for developing flexible, goal-directed behavioral control. 
In this paper we propose a computational neural model of how 
knowledge about the body can be represented, processed, and 
learned. 



¹Note that Table 1 lists the terminology utilized in this paper. 



When learning such a body schema, specific challenges must 
be met. First, sensory information about the body is available 
in different modalities and frames of reference. Thus, mappings 
between these modalities need to be established. Second, uncer- 
tainty due to noise, external forces, and changes of the body and 
the environment has to be handled effectively. Third, different 
information signals about the body may contradict each other, 
so that the maintenance of the present body state estimate is 
non-trivial. 

The human brain has solved these challenges. In particular, the 
brain appears to be able to flexibly integrate multimodal sensory 
information about the body into a current estimate of its body 
state. This body state estimate seems to be modularized in two 
ways: with respect to sensory modalities and with respect to 
body parts. 

Evidence for sensor-specific modularizations can be found 
in brain imaging studies, which suggest that cross-modal sensory 
information fusion is common when perceiving one's own 
body (Shams et al., 2000; Shimojo and Shams, 2001; Beauchamp, 
2005). Related research suggests that body state representations 
are separated into body parts to certain degrees (Andersen et al., 
1997; Gentner and Classen, 2006; Latash et al., 2007; Shadmehr 
and Krakauer, 2008; de Vignemont et al., 2009). Thus, a highly 
modularized body state estimate is maintained by our brain. 

Table 1 | MMF-terminology. 

Body image
    A usually conscious representation of the way the body appears from the outside (Haggard and Wolpert, 2005).

Body model
    Static knowledge about the body: segmentation into body parts, metrics, and mappings between modules.

Body schema
    A group of body representations relevant for action (Haggard and Wolpert, 2005). This includes body state, body space, and body model. It allows for updates of the body state. The term body schema has been used across disciplines and with varying degrees of precision (Sekiyama et al., 2000; Buneo et al., 2002; Battaglia-Mayer et al., 2003; Maravita et al., 2003; Makin et al., 2008; Burns and Blohm, 2010; Hoffmann et al., 2010; Sober and Kording, 2012).

Body space
    Reachable space of a particular body part in a particular modality.

Body state
    An estimate of the current body configuration. May refer to the body state encoded in a single module or spread over multiple modules.

Distal-to-proximal
    Mapping direction: fingertip → wrist → elbow → shoulder.

Forward
    Mapping direction: joint angles → local orientation → global orientation → location.

Frame of reference
    The coordinate system of a module: "global" (shoulder centered) or "local" (respective the next proximal body part).

Information fusion
    Bayes optimal fusion of multiple probability distributions. These may include multiple sensors, multiple body states in different modules, or both.

Inverse
    Mapping direction: location → global orientation → local orientation → joint angles.

Mappings
    The set of connections between neurons in one or two input modules and neurons in one output module. There are three "types" of mappings: forward kinematics, inverse kinematics, and distal-to-proximal kinematics. They are used to propagate neuronal activity to other modules.

Modality
    Which information is encoded in which frame of reference: nMMF uses position-vectors, orientation-vectors in a "global" (i.e., respective the shoulder) or "local" (i.e., respective the next proximal body part) frame of reference, or joint-angles.

Module
    A state space of the body, such as the wrist location in space. Modules may differ with respect to modalities, frames of reference, and body parts.

Neural population
    A set of neurons that encode the spatial distribution in a particular module. The population as a whole encodes a probability distribution.

nMMF
    neural Modular Modality Frame model: the model presented in this work.

Proximal-to-distal
    Mapping direction: shoulder → elbow → wrist → fingertip (cf. Figure 4).

$q_l^i$
    Probability mass of the l-th neuron in module i's population. The probability mass is the same as the Voronoi volume $V_l$ (cf. Appendix A.2) times neuron l's probability density, normalized to 1.

Sensor integration
    The special case where sensory information is fused with the body state. Also, the result becomes the new body state.

Transformation step
    Projects input information from one or two modules to a neighboring module.

To maintain such a modularized but consistent body 
state estimate, information is effectively exchanged and fused 
across the modularizations (Tononi et al., 1998; Ernst and 
Bulthoff, 2004; Stein and Stanford, 2008). The information 
exchange typically depends on how the body is currently 
positioned and oriented in space (Holmes and Spence, 2004; Butz 
et al., 2010). Neurological disorders further indicate that both 
sensory input and body state estimates are fused across modules 
(Giummarra et al., 2008). To combine incoming sensory information 
with the most accurate body state estimate, the brain 
also anticipates body state changes and consequent sensory feedback 
during movement execution (von Holst and Mittelstaedt, 
1950; Blakemore et al., 2000; Sommer and Wurtz, 2006). Many 
of these interactions seem to take place in early stages of the 
cortical processing hierarchy (Stein and Stanford, 2008), probably 
before the sensory information is fully integrated into the 
own body state estimate. Further evidence for sensory information 
comparisons and the flexible fusion of this information for 
maintaining body state estimates is given by multimodal illusions 
like the rubber hand illusion (Botvinick et al., 1998; Haggard 
and Wolpert, 2005; Makin et al., 2008) and the Pinocchio illusion 
(Lackner, 1988). Thus, it appears that while the brain's body 
state estimate is highly modularized, many interactions ensure 



Frontiers in Computational Neuroscience 



www.frontiersin.org 



October 2013 | Volume 7 | Article 148 | 2 



Ehrenfeld et al. 



Modular neuron-based body estimation 



an effective estimate maintenance and sensory information inte- 
gration. However, it remains unclear how, when, and which 
information is compared and selectively fused. 

We recently proposed the Modular Modality Frame (MMF) 
model (Ehrenfeld and Butz, 2011, 2012, 2013), which models the 
maintenance of a body state estimate given noisy, multimodal 
sensory information sources. The MMF model fully relies on 
hard-coded kinematic knowledge of the simulated body and estimates 
body states by means of Gaussian probability densities. 
Here we present a neural extension of MMF — the neural Modular 
Modality Frame (nMMF) model. The novel contributions of 
nMMF are as follows: 

First, body spaces, current body state estimation modules, 
and mappings between body modules are now implemented 
neurally. As a result, nMMF is able to encode arbitrary, even mul- 
timodal body state estimations. Moreover, the neural population 
encodings for body state estimates are plausible from a computational 
neuroscience perspective (Deneve and Pouget, 2004; 
Knill and Pouget, 2004; Deneve et al., 2007; Doya et al., 2007). 
Second, we now ensure that the Shannon entropy of a distri- 
bution remains unchanged during multi-body state fusion, in 
order to avoid excessive information gain when fusing depen- 
dent sources of information. Third, information exchange is 
no longer restricted to forward and inverse kinematic map- 
pings. Distal-to-proximal mappings are also included. This means 
that information about the hand in space can, for exam- 
ple, influence the estimate of the elbow location, of the ori- 
entation of the upper arm, or even of the shoulder joint 
angles. 

The remainder of this paper is structured as follows. First, the 
nMMF model is detailed. Next, nMMF is evaluated on a sim- 
ulated two degree of freedom arm in a two-dimensional setup. 
The evaluations show that nMMF is able to detect faulty sensory 
information on the fly and is able to propagate information 
appropriately distal-to-proximal, i.e., from hand to upper arm. 
In the final discussion, we compare nMMF to related models and 
sketch out future research directions. 

2. MATERIALS AND METHODS 

nMMF is inspired by those processes of human body state esti- 
mation which are detailed above. In a computational framework, 
these processes can be approximated by five key assumptions: 
(1) the body state is continuously estimated probabilistically over 
time; (2) multimodal, redundant sensory information sources 
are integrated based on Bayesian principles; (3) the body state 
representation is modularized along body parts as well as along 
modalities and their corresponding frames-of-reference; (4) the 
body modules are locally interactive in that information about the 
body state is compared and fused locally; (5) the redundant, mod- 
ularized representation of the body is exploited for autonomous 
sensor failure detection and subsequent avoidance of the failing 
sensor's influence. 

We now detail how these key aspects are realized in nMMF. 
First, we describe which modules are used, second, how neurons 
encode the sensory inputs and the body state, third, how informa- 
tion is fused, fourth, how information is projected across mod- 
ules, fifth, how conflicting information is detected and blocked 



out, and, finally, how the overall information flow unfolds 
over time. In the subsequent evaluation section we show how 
nMMF processes sensory information, how faulty sensory infor- 
mation can be ignored to a certain degree, but also how such 
faulty sensory information can influence the complete body state 
estimation. 

2.1. MODULES 

nMMF represents a body state by a collection of modules, where 
each module represents an aspect of the overall body state. In par- 
ticular, nMMF's modules differ with respect to (1) the encoded 
joint (or the next distal limb) and (2) the modality frame in 
which the joint or limb is encoded. The term modality frame 
defines which modality is perceived (location, orientation, or 
joint angle) and in which frame of reference the modality is 
encoded (shoulder-centered or "local" with respect to the next 
proximal limb). 

In the following, we focus on a general description of a 
humanoid arm, although the same principle may apply for a complete 
body description. First, we specify the state of an arm in 
general. Next, we detail how nMMF encodes the arm state in its 
respective modules. 

2.1.1. Arm specification 

An arm state may be encoded by the arm's location in space, 
its limb orientations, or the joint angles. With respect to the 
arm's location, we denote the shoulder (elbow, wrist, fingertips) 
location by $\mathbf{x}_0$ ($\mathbf{x}_1$, $\mathbf{x}_2$, $\mathbf{x}_3$) (cf. Figure 1 for an illustration). To 
derive the arm limb orientations we simply subtract successive 
limb locations. To additionally encode the inner rotations of 
the respective limbs, we define a point $\mathbf{k}_i$ for each limb $i$, 
where $\mathbf{k}_i$ is locked relative to the limb. Essentially, $\mathbf{k}_i$ always lies 
somewhere on the unit circle around $\mathbf{x}_{i-1}$, where the unit circle's 
plane is perpendicular to the orientation of limb $i$. Finally, 
the joint angles of each arm joint $i$ are denoted by the Tait-Bryan 
angles $(\phi_{i,1}, \phi_{i,2}, \phi_{i,3})$, which rotate about the intrinsic 
rotation axes ${}^{i-1}x$, ${}^{i-1}y'$, ${}^{i-1}z''$, where one (two) apostrophes 
denote that the rotation axis has been rotated by the angle 
$\phi_{i,1}$ (and $\phi_{i,2}$). 

FIGURE 1 | Schematic of the four "hand"-limb-encoding modules. 
Three coordinate systems (solid axes) are shown, together with the 
components (dashed lines) of the respective encoded vector. Dark gray 
(Global Location module): the coordinate system is centered around the 
shoulder with fixed orientation. Encoded is the global location vector, which 
goes from shoulder to the end-effector. Yellow (Global Orientation module): 
the coordinate system has the same orientation as the gray one but in this 
case the limb orientation is encoded by means of two vectors: a unit 
vector parallel to the "hand" limb (shown, dashed lines), and a 
perpendicular vector (not shown). Red (Local Orientation module): the local 
coordinate system is oriented along the forearm. Relative to this forearm 
orientation, the orientation of the "hand" limb is encoded by a unit vector 
parallel to the "hand" limb (shown), and a perpendicular vector (not 
shown). Green (Local Angle module): the fourth module encodes angles. 
The same four modules and respective coordinate systems exist for the 
forearm and the upper arm (not shown). Modified based on Ehrenfeld and 
Butz (2012, 2013). 

2.1.2. nMMF's arm encoding 

nMMF encodes probabilistic arm states by means of dis- 
tributed population codes in redundant modules. In partic- 
ular, each limb is encoded in four modality frames: global 
location (GL), global orientation (GO), local orientation (LO), 
and local (joint) angles (LA). Note that other modalities could 
be used in addition and other combinations of modalities 
and frames of reference are possible — such as a local loca- 
tion. It is crucial, however, that the chosen combinations 
form a redundant estimate of the overall body state. nMMF's 
implemented modules and their interactions are shown in 
Figure 4; Figure 1 shows the employed modality frames for an 
exemplar arm. 

To encode each modality frame, respective coordinate systems 
need to be defined. In order to provide a consistent notation for 
all nMMF modules, we introduce $\mathbf{x}_i^{Z}$ as the estimated arm state 
of limb $i$ in modality frame $Z$, where $Z \in \{GL, GO, LO, LA\}$.² 

The first modality frame encodes the global location (GL) of 
an arm limb. Limb $i$'s end point $\mathbf{x}_i$ in the GL modality frame is 
the 3D vector from the shoulder to the end-point of limb $i$: 

$$\mathbf{x}_i^{GL} = \mathbf{x}_i - \mathbf{x}_0. \qquad (1)$$



The global orientation (GO) is a 6D vector. It concatenates both 
a 3D unit-vector in the direction of the arm limb, and a 3D 
unit-vector perpendicular to the arm limb dependent on its inner 
rotation: 



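$$\mathbf{x}_i^{GO} = \begin{pmatrix} \mathrm{unit}(\mathbf{x}_i - \mathbf{x}_{i-1}) \\ \mathrm{unit}(\mathbf{k}_i - \mathbf{x}_{i-1}) \end{pmatrix}. \qquad (2)$$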



As both vectors are unit vectors and are perpendicular to each 
other, three degrees of freedom are canceled out and all remaining 
orientation vectors form a 3D manifold in 6D space. 
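
The constraint count behind this statement can be checked numerically. The following is a minimal sketch, assuming the GO vector is built as in Equation (2) from the proximal joint location, the limb end point, and the limb-fixed point; the function names and the example coordinates are hypothetical and not part of nMMF.

```python
import math

def sub(a, b):
    return [ai - bi for ai, bi in zip(a, b)]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def unit(v):
    n = norm(v)
    return [c / n for c in v]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def global_orientation(x_prev, x_i, k_i):
    """Return the 6D GO vector: limb direction, then inner-rotation direction."""
    along = unit(sub(x_i, x_prev))    # unit vector parallel to the limb
    across = unit(sub(k_i, x_prev))   # unit vector given by the limb-fixed point
    return along + across

# Hypothetical example: limb along the x-axis, limb-fixed point on the y-axis.
go = global_orientation([0.0, 0.0, 0.0], [0.3, 0.0, 0.0], [0.0, 1.0, 0.0])
along, across = go[:3], go[3:]

# Two unit-length constraints plus one orthogonality constraint remove
# three of the six degrees of freedom.
constraints = (norm(along) - 1.0, norm(across) - 1.0, dot(along, across))
```

All three entries of `constraints` vanish for a valid GO vector, which is why only a 3D manifold of the 6D space is populated.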

The local orientation (LO) is analogous, but expresses both 
subvectors in a local coordinate system (e.g., $LO_2$ is expressed in 
a coordinate system whose axes are defined by $GO_1$). Again, only 
a 3D manifold remains: 

$$\mathbf{x}_i^{LO} = \begin{pmatrix} \mathrm{unit}({}^{i-1}\mathbf{x}_i - {}^{i-1}\mathbf{x}_{i-1}) \\ \mathrm{unit}({}^{i-1}\mathbf{k}_i - {}^{i-1}\mathbf{x}_{i-1}) \end{pmatrix}. \qquad (3)$$

²Without any additional specification, arm states are encoded in a 
"global", i.e., shoulder-respective coordinate system. In the case of the 
local orientation (LO) modality frame, however, the coordinate system 
used to encode the state is relative to the next proximal arm 
limb. We use the pre-superscript to denote the encoding of a location 
in a limb-relative coordinate system. For example, ${}^{i-1}\mathbf{x}_i$ denotes 
the location of limb $i$ relative to the location $\mathbf{x}_{i-1}$ and orientation of 
limb $i-1$. 

Note that we use the pre-superscript to denote a particular, relative 
coordinate system, whereas we use the subscript to denote a 
particular limb. Furthermore, note that ${}^{i-1}\mathbf{x}_{i-1} = (0, 0, 0)^T$ due 
to the definition of the coordinate system relative to limb $i-1$. 
Finally, the local angles (LA) are encoded as Tait-Bryan angles 

$$\mathbf{x}_i^{LA} = (\phi_{i,1}, \phi_{i,2}, \phi_{i,3})^T, \qquad (4)$$

which is identical to the arm encoding itself. 

Note that all modality frames are maximally 3D. Thus, the 
locality of the modular architecture ensures that the number of 
neurons needed to represent a particular modality frame with 
a neural population code of $n$ neurons per dimension scales in 
$O(n^3)$. 
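
To make the scaling argument concrete, the following sketch counts neurons for a modular encoding with three limbs and four modality frames per limb, each at most 3D. The comparison with a single population over all nine joint angles is an assumption added here for illustration only; the text itself makes no such comparison, and all function names are hypothetical.

```python
def neurons_per_module(n, dims=3):
    """Neurons needed for one module with n neurons per dimension."""
    return n ** dims

def modular_total(n, limbs=3, frames=4, dims=3):
    """Total neurons when every limb is encoded in every modality frame."""
    return limbs * frames * neurons_per_module(n, dims)

def monolithic_total(n, joint_dims=9):
    """Assumed alternative: one population over all nine joint angles."""
    return n ** joint_dims

n = 10
modular = modular_total(n)        # 12 modules with n^3 neurons each
monolithic = monolithic_total(n)  # a single n^9 population
```

With ten neurons per dimension, the modular encoding needs 12,000 neurons, whereas the assumed monolithic population would need a billion.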

2.2. PROBABILISTIC REPRESENTATION 

In complex tasks, uncertainty is ubiquitous due to sensory and 
motor noise, external forces, changes in the environment, and 
changes of the body schema. To deal with this uncertainty, 
humans apply probabilistic body state estimations (Ernst and 
Banks, 2002; Kording and Wolpert, 2004). In computational 
models (e.g., Ma et al., 2006), state estimates are often simplified 
by confining probability density estimates to one type of distri- 
bution (such as the Gaussian, Gamma or Poisson distributions). 
However, shapes may vary greatly due to non-linear influences of 
mappings across modules, constraints (like joint restrictions or 
obstacles), varying shapes of sensory input to begin with, or even 
neural disorders. Moreover, in certain circumstances the brain 
may actually maintain multimodal alternatives about the current 
body state. 

In contrast to MMF, nMMF approximates probability distri- 
butions with neural population codes (Deneve et al., 1999) to 
enable the representation of probability distributions with arbi- 
trary shapes. Each neuron in such a code is responsive to specific 
values of the input data (preferred value) and thus has a local 
receptive field of a particular size. Note that by using population 
codes, the shapes of the encoded probability distributions become 
unconstrained. The modularity of nMMF ensures a scalable neu- 
ral encoding of the arm or even the full body. In the following, we 
describe how the receptive fields and the preferred values of the 
population neurons are determined. 

2.2.1. Sampling of neural populations 

In order to create neurons only within the reachable manifolds, 
we let the populations of neurons grow while observing simulated 
arm states. This is done in the following way: A simulated arm is 
set to a random arm position, which is uniformly distributed in 
angular space. Then, noiseless measurements $\mathbf{z}^j$ are obtained in 
each module $j$. If 

$$\|\mathbf{z}^j - \mathbf{x}_l^j\| > d_{\min} \quad \forall\, l \in \{1, \ldots, N^j\}, \qquad (5)$$

a new neuron is added at $\mathbf{z}^j$, where $\mathbf{x}_l^j$ denotes the preferred value 
of neuron $l$ and $N^j$ the current number of neurons that exist in 
module $j$. Next, the arm is set to a new random position. Thus, 
all sampling positions are independent of each other and the 
resulting neurons in each module are approximately uniformly 
distributed, covering the reachable manifold. 
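
The growth procedure can be sketched for a single module as follows. This is a minimal illustration, assuming a planar two-joint arm and a fingertip location module; the limb lengths, the joint ranges, the threshold value, and the function names are hypothetical choices rather than the settings used in nMMF.

```python
import math
import random

def fingertip_location(angles, lengths=(0.3, 0.25)):
    """Noiseless measurement: fingertip location of a planar two-joint arm."""
    a1, a2 = angles
    x = lengths[0] * math.cos(a1) + lengths[1] * math.cos(a1 + a2)
    y = lengths[0] * math.sin(a1) + lengths[1] * math.sin(a1 + a2)
    return (x, y)

def grow_population(n_samples, d_min, seed=0):
    """Add a neuron whenever a sample is farther than d_min from all neurons."""
    rng = random.Random(seed)
    neurons = []
    for _ in range(n_samples):
        # Random arm position, uniformly distributed in angular space.
        angles = (rng.uniform(-math.pi, math.pi), rng.uniform(0.0, math.pi))
        z = fingertip_location(angles)
        # Criterion of Equation (5): distance to every existing neuron.
        if all(math.dist(z, x_l) > d_min for x_l in neurons):
            neurons.append(z)
    return neurons

neurons = grow_population(n_samples=5000, d_min=0.05)
```

Because a sample is only accepted when it is far enough from every existing neuron, the population stops growing once the reachable region is covered at the chosen resolution.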

2.2.2. Tuning function 

Each neuron has an associated tuning function (Deneve et al., 
1999), which specifies how the neuron responds to a signal. We 
use Gaussian tuning functions with mean $\mathbf{x}_l$ and covariance $\mathbf{R}$. 
For instance, if a measurement signal occurs at position $\mathbf{z}$, the 
probability density function (PDF) at $\mathbf{x}_l$ is: 

$$p_l = \mathcal{N}(\mathbf{z}, \mathbf{R})(\mathbf{x}_l). \qquad (6)$$



In effect, a Gaussian PDF is activated over the whole neural 
population (cf. Figure 2, yellow bars for an illustration). If the 
covariance R of all tuning functions is equal to the sensor covari- 
ance, then Equation (6) is the same as the inverse measurement 
model (Thrun et al., 2005). 

Since probability mass has to be conserved when informa- 
tion flows from one module to another in nMMF, we derive 
the probability mass function (PMF) from the PDF. Note that 
the neural PMF encoding will typically slightly differ from 
the PDF encoding in nMMF, because the population codes in 
nMMF may not be uniformly distributed. This is illustrated in 
Figure 2. 

2.2.3. Probability mass 

Let $X$ be a multivariate random variable, and $\omega$ a subset of a 
sample space $\Omega$. The probability mass $q$ in $\omega$ corresponds to the 
probability that $X$ lies in $\omega$: 

$$q = \Pr[X \in \omega] = \int_{\omega} p(\mathbf{x})\, d\mathbf{x}. \qquad (7)$$

Just as $N$ neurons are spread over $\Omega$, $\Omega$ is discretized into $N$ subsets 
$\omega_l$, $l \in (1..N)$, which are simply the Voronoi cells $R_l$ of those 
neurons (cf. Appendix A.2). The probability mass of a neuron 
can then be approximated by the volume $V_l$ of the cell times the 
density (Equation 6) at the neuron's position 

$$q_l = \int_{R_l} p(\mathbf{x})\, d\mathbf{x} \approx \frac{V_l \cdot p(\mathbf{x}_l)}{\sum_{l^*=1}^{N} V_{l^*} \cdot p(\mathbf{x}_{l^*})}, \qquad (8)$$

where the denominator normalizes the probability mass to 1. An 
illustration of a probability mass is shown in Figure 2, blue bars. 

FIGURE 2 | Each neuron has a tuning function (Deneve et al., 1999) 
that defines how the neuron responds to a signal. Generally, these 
tuning functions are considered to be bell-shaped, such as the shown 
Gaussian kernels. As a consequence of this encoding, the PDF encoded by 
the neural population becomes Gaussian as well (yellow bars), while the 
probability mass (blue) is somewhat distorted because it accounts for the 
local neural density. 

To handle potential approximation errors, we ensure that the sum 
of the probability mass over all $N$ neurons in a module is always 
normalized to 1, by 

$$q_l \leftarrow \frac{q_l}{\sum_{l^*=1}^{N} q_{l^*}}, \qquad (9)$$

where the symbol $\leftarrow$ is used as a value update assignment. 
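
The conversion from tuning-function densities to probability masses can be illustrated in one dimension, where the Voronoi cell of a neuron is simply the interval between the midpoints to its neighbors. The sketch below assumes a bounded 1D module and non-uniformly placed neurons; the positions, the domain limits, the variance, and the function names are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def voronoi_lengths(positions, lower, upper):
    """Length of each 1D Voronoi cell for sorted positions inside [lower, upper]."""
    mids = [(a + b) / 2.0 for a, b in zip(positions, positions[1:])]
    edges = [lower] + mids + [upper]
    return [right - left for left, right in zip(edges, edges[1:])]

def probability_mass(positions, volumes, z, var):
    """Density at each neuron (Equation 6) times cell volume, normalized to 1."""
    density = [gaussian_pdf(x_l, z, var) for x_l in positions]
    raw = [v * p for v, p in zip(volumes, density)]
    total = sum(raw)
    return [r / total for r in raw]

# Neurons are dense on the left and sparse on the right.
positions = [0.0, 0.1, 0.2, 0.4, 0.8, 1.0]
volumes = voronoi_lengths(positions, lower=-0.1, upper=1.1)
q = probability_mass(positions, volumes, z=0.3, var=0.04)
```

Neurons in sparsely covered regions own larger cells and therefore receive more mass than their density value alone would suggest, which is the distortion between the density and the mass shown in Figure 2.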
2.3. INFORMATION FUSION 

With a neural, modularized, probabilistic body state represen- 
tation in hand, we now focus on information processing and 
information exchange. In this section, we first detail the fusion 
of different neurally-represented PDFs, and subsequently derive 
the fusion of different PMFs. Two cases are considered: the 
information carried by the different PMFs is either dependent or 
independent. 

The Bayesian fusion (Bloch, 1996) of multiple independent 
neurally-encoded probability distributions is the neuron-wise 
product of the respective PDFs. Thus, the fusion yields: 



$$p_{\mathrm{fused},l} \propto \prod_{j=1}^{M} p_{j,l}, \qquad (10)$$

where $M$ specifies the number of modality frames that are fused, 
$l$ is the index of a specific neuron, and $p_{j,l}$ encodes the probability 
density that stems from modality frame $j$ and that is covered by 
neuron $l$. As the density can be converted to a mass by $q_l = p_l \cdot V_l$, 
applying this identity to both sides of Equation (10) yields 
the fusion of PMFs 

$$q_{\mathrm{fused},l} = \frac{(V_l)^{-(M-1)} \prod_{j=1}^{M} q_{j,l}}{\sum_{l^*} (V_{l^*})^{-(M-1)} \prod_{j=1}^{M} q_{j,l^*}}. \qquad (11)$$

When Equation (10) or (11) is used to fuse partly or fully dependent 
information, the resulting distribution is overconfident (i.e., 
too narrow). 
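
The fusion of probability masses can be sketched as follows, assuming that all fused distributions are defined on the same neurons of one module and that each neuron's Voronoi volume is known. The function name and all numbers are hypothetical.

```python
def fuse_masses(masses, volumes):
    """Fuse M probability mass functions defined on the same neurons (Equation 11)."""
    m = len(masses)
    raw = []
    for l, v in enumerate(volumes):
        product = 1.0
        for q in masses:
            product *= q[l]
        # Dividing by V^(M-1) turns the product of masses back into a
        # product of densities times a single volume.
        raw.append(v ** (-(m - 1)) * product)
    total = sum(raw)
    return [r / total for r in raw]

volumes = [0.5, 0.5, 1.0, 2.0]   # hypothetical Voronoi volumes
q_a = [0.1, 0.2, 0.4, 0.3]       # first information source
q_b = [0.05, 0.15, 0.5, 0.3]     # second information source
q_fused = fuse_masses([q_a, q_b], volumes)
```

Without the volume correction, neurons with large cells would be favored twice, once in each factor of the product.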

To correct for this overconfidence, the PDF can be raised 
to the power of an exponent $\alpha < 1$. However, since we 
encode PMFs, additional conversions are again necessary 
to account for the Voronoi volumes covered by the 
respective neurons. The correction for overconfidence is thus 
accomplished by: 

$$q_{\mathrm{fused},l} \leftarrow \frac{(V_l)^{1-\alpha} \, (q_{\mathrm{fused},l})^{\alpha}}{\sum_{l^*} (V_{l^*})^{1-\alpha} \, (q_{\mathrm{fused},l^*})^{\alpha}}, \qquad (12)$$



where the denominator normalizes the mass to 1. The effect 
is a widening of the encoded PMF, which is illustrated in 
Figure 3. 
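
The widening effect can be reproduced with a few lines. The sketch below assumes that the density is recovered as mass divided by Voronoi volume, raised to an exponent, and converted back to a normalized mass; the function name and the example distribution are hypothetical.

```python
def widen(mass, volumes, alpha):
    """Raise the encoded density to the power alpha and renormalize the mass."""
    # (q / V)^alpha * V = V^(1 - alpha) * q^alpha
    raw = [v ** (1.0 - alpha) * q ** alpha for q, v in zip(mass, volumes)]
    total = sum(raw)
    return [r / total for r in raw]

volumes = [1.0, 1.0, 1.0, 1.0, 1.0]   # uniform cells for simplicity
q = [0.02, 0.18, 0.6, 0.18, 0.02]     # peaked distribution
q_wide = widen(q, volumes, alpha=0.5)
```

With an exponent below 1 the peak is lowered and the tails are raised, while an exponent of exactly 1 leaves the distribution unchanged.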

To infer the exponent $\alpha$, a measure of information content is 
required. We use the Shannon entropy $h$ to estimate the amount 
of information in a PMF: 

$$h = -\sum_{l} q_l \cdot \ln(q_l), \qquad (13)$$

where $q_l$ may denote the fused distribution as in Equation (11) 
or any other arbitrary distribution. If all distributions were 
Gaussian, the exponent could be derived from Equation (12) by 
requiring that the Shannon entropy in a module before fusion 
should be equal to the Shannon entropy after fusion: 



(14) 



Due to the lack of a rigorous derivation of $\alpha$ in the general case, 
we utilize this approximation to determine $\alpha$ for our population-encoded 
probability masses in each module. 

FIGURE 3 | The solid blue curve is modified by raising the PDF neuron-wise 
to the power of an exponent, resulting in the dashed yellow curve. As the 
exponent is <1, the distribution is widened, i.e., information is diffused. 
This effect is used in two cases: (1) to correct for overconfidence due to the 
combination of dependent information sources and (2) to reduce the 
influence of a module that is in conflict with other modules. 

2.4. CROSS-MODULE CONNECTIONS 

With notations for modules in nMMF, neurally-encoded probability 
masses, and information fusion of redundant sources of 
information at hand, we now specify how the neural, cross-module 
connections are implemented in nMMF. 

Modules may differ along two axes: the limb axis (proximal-to-distal, 
shown horizontally in Figure 4), and the modality 
frame axis (forward and inverse, shown vertically in Figure 4). 
Information may flow from one or two input modules to a 
neighboring output module. This may happen diagonally: Out 
of the four diagonal directions, only three are single transformation 
steps: proximal-to-distal-forward, proximal-to-distal-inverse, 
and distal-to-proximal-forward.³ Together, all three 
form a triangle in Figure 4, e.g., (GL2, GL3, GO3). In robotics, 
proximal-to-distal-forward and proximal-to-distal-inverse are 
typically termed forward and inverse kinematics, respectively, 
while distal-to-proximal mappings are often ignored. 

³In contrast, the fourth diagonal direction, distal-to-proximal-inverse, is not 
a single transformation step: the fingertip location and the hand orientation 
simply do not influence the proximal arm's orientation directly. 

FIGURE 4 | Transformation steps between different modules: The 
modules (shown as circles) differ with respect to limbs (horizontal 
axis) and with respect to modalities and frames of reference (vertical 
axis). Every transformation step consists of one or two input modules and 
one output module. An example is the two solid lines on the top right: 
together, they encode how the wrist location GL2 depends on both the 
fingertip location GL3 and the global hand orientation GO3. Yellow 
dash-dotted lines are the forward kinematics, dark gray dotted lines the 
inverse kinematics, and red solid lines the distal-to-proximal kinematics. 
Modified based on Ehrenfeld and Butz (2012, 2013). 

2.4.1. Single transformation steps 

Rather than learning the neural connections, here we use hard-coded 
kinematic mappings 

(15) 

where i, k are neighboring modules of nMMF. A derivation of 
the closed form of these mappings can be found in Ehrenfeld and Butz 
(2013). 

For all pairs of input neurons $m$ and $n$, connections are built 
to those neurons $l$ in the output module, which are sufficiently 
close to the transformation result $\mathbf{x}^{i|jk}(m, n)$. The Gaussian 
(Equation 30) value for the Euclidean distance of each neuron $l$ 
in the output module $i$ to the transformation result $\mathbf{x}^{i|jk}(m, n)$ 
is used as connection strength $w$: 

$$w_{l,m,n} = V_l^i \cdot \mathcal{N}(\mathbf{x}^{i|jk}(m, n), \mathbf{R}_{Map})(\mathbf{x}_l^i), \qquad (16)$$

where the receptive field covariance $\mathbf{R}_{Map}$ regulates how much 
the mapping itself widens the encoded probability distribution. It 
models an information loss during a transformation, either due 
to inaccurate mappings or due to discretization errors. Since we 
use accurate mappings, we only need to consider the latter and 
therefore base $\mathbf{R}_{Map}$ on the neuron distance in the output module. 

If the transformation step has two inputs from the location 
modality GL (e.g., an elbow location $GL_1$ and a wrist location 
$GL_2$) the distance of both neurons' preferred values $|\mathbf{x}_m - \mathbf{x}_n|$ 
must be approximately equal to the length of the forearm. We 
introduce a modifying factor $F_{mn}$ with respect to neurons $m$ and $n$, 
which reflects how well the constraint is met: 

(17) 

where $d_{limb}$ is the length of the respective arm limb, and 
$\Delta\mathbf{x}_{mn} = \mathbf{x}_m - \mathbf{x}_n$ the relative position of both input neurons. 
Intuitively, $(|\Delta\mathbf{x}_{mn}| - d_{limb})^2$ results in a penalization of larger 
deviations from the limb length, and the first factors scale this 
penalization dependent on the covariance in the mapping. For 
all other transformation steps, no constraints are necessary, and 
$F_{mn} = 1$ in these cases. In consequence, the connection weights $w$ 
are normalized by 

$$w_{l,m,n} \leftarrow \frac{w_{l,m,n} \cdot F_{mn}}{\sum_{l^*} w_{l^*,m,n}} \quad \forall\, m, n, \qquad (18)$$

where the modifying factor $F_{mn}$ blocks the influence of pairs of 
location neurons that do not correspond with the arm length 
sufficiently well. 

Finally, the projection of two probability distributions $q^j$, $q^k$ 
along the connections $w$ into module $i$ yields 

$$q_l^i = \frac{\sum_m \sum_n q_m^j \, q_n^k \, w_{l,m,n}}{\sum_{l^*} \sum_m \sum_n q_m^j \, q_n^k \, w_{l^*,m,n}}, \qquad (19)$$

where the denominator normalizes the overall activity again to 1. 
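
A single transformation step can be sketched with a toy example. The code below assumes two 1D input modules, a 1D output module, and the hypothetical mapping "sum of both inputs" in place of a kinematic mapping; it also assumes that weights are Gaussian in the distance to the mapped value, scaled by the output cell volume, and normalized over the output neurons, with the modifying factor set to 1. All names and numbers are illustrative.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def build_weights(out_pos, out_vol, in_a, in_b, mapping, var_map):
    """weights[m][n][l]: connection from input neurons m, n to output neuron l."""
    weights = []
    for x_m in in_a:
        row = []
        for x_n in in_b:
            target = mapping(x_m, x_n)  # transformation result for this pair
            w = [v * gaussian_pdf(x_l, target, var_map)
                 for x_l, v in zip(out_pos, out_vol)]
            total = sum(w)
            row.append([w_l / total for w_l in w])  # normalize over output neurons
        weights.append(row)
    return weights

def project(q_a, q_b, weights, n_out):
    """Project two input mass distributions into the output module."""
    raw = [0.0] * n_out
    for m, q_m in enumerate(q_a):
        for n, q_n in enumerate(q_b):
            for l in range(n_out):
                raw[l] += q_m * q_n * weights[m][n][l]
    total = sum(raw)
    return [r / total for r in raw]

in_a = [0.0, 0.1, 0.2]
in_b = [0.0, 0.1, 0.2]
out_pos = [0.0, 0.1, 0.2, 0.3, 0.4]
out_vol = [0.1] * 5
weights = build_weights(out_pos, out_vol, in_a, in_b,
                        mapping=lambda a, b: a + b, var_map=0.05 ** 2)

q_a = [0.0, 1.0, 0.0]   # all mass at 0.1
q_b = [0.0, 0.0, 1.0]   # all mass at 0.2
q_out = project(q_a, q_b, weights, n_out=len(out_pos))
```

Even though both inputs are perfectly certain, the output is a bump around 0.3 rather than a single active neuron, because the mapping covariance spreads the mass over neighboring output neurons.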
2.4.2. Chain of transformation steps 

As nMMF's modules are strongly interconnected, information 
flows from any module to all other modules. This requires that 
multiple information transformation steps be done successively. 

In nMMF, information is projected into other modules by
means of two different approaches. The first approach is used
when information needs to stay independent for determining
plausibility estimates (cf. section 2.5). In this case, the forward
or inverse kinematic mappings are used without fusing other
information on the way. Thus, information is not mixed and
projections of independent information sources into a common
module stay independent. For example, sensory input
from a local angle module may be projected to the corresponding
global location module by the forward kinematics
chain LA → LO → GO → GL. Meanwhile, sensory information
from the global orientation may also be projected into GL by
GO → GL. These two information sources remain independent
of each other but are now represented in a common module and
can thus be directly compared.

The second approach is used when information is fused across 
modules (cf. section 2.6). In this case, the information is projected 
across the modules of nMMF by alternating between local projec- 
tion and information fusion steps. For example, the LA informa- 
tion is projected to LO, where the result is fused with the LO input. 
The fused result is then projected further to GO, where the result 
is fused again, and so on. This method enables the integration of 
even incomplete information^ and it reduces computation time 
because fewer transformation steps are required. 

2.5. CONFLICT RESOLUTION 

The information, which is exchanged via the specified cross- 
module connections, has a specific certainty to it. This cer- 
tainty is encoded implicitly in the neural population codes 
in each module. Sensory signals are encoded in a population 
code by making assumptions about the noise in the signal, 
typically using a measurement model (Thrun et al., 2005). 
However, those assumptions can be violated by, for exam- 
ple, sudden occurrences of systematic sensor errors, unac- 
quainted environmental conditions, or changes in the body 
schema due to growth or injury. To be able to account for 
such potentially unknown signal disturbances, nMMF estimates 
plausibilities for each signal. If a signal has low plausibility, 
it is mistrusted and its information content is consequently 
decreased. 

Because the true state of the body is unknown, nMMF 
estimates signal plausibilities by comparing different, redun- 
dant information sources. The modular encoding of the body 
in nMMF is highly suitable for conducting such comparisons. 
Given several redundant distributions about a body state, a 
failing distribution can be detected when it systematically and 
strongly differs from the complementary, redundant sources of 
information. 

2.5.1. Acquisition of plausibilities 

Let $m_{12}$ be a measure of how well two sources (or
distributions) 1 and 2 match each other. Zhang and Eggert
(2009) provide an overview of different potential measures
for $m_{12}$. In nMMF, we use the scalar product as a
matching measure. Given any neural module $i$, in which
two PMFs (1 and 2) are encoded, their relative match is
determined by:

$$ m^i_{12} = \frac{q_1(x^i) \cdot q_2(x^i)}{\|q_1(x^i)\| \, \|q_2(x^i)\|} = \frac{\sum_l q_{1,l} \, q_{2,l}}{\sqrt{\sum_l q_{1,l}^2} \, \sqrt{\sum_l q_{2,l}^2}}, \tag{20} $$

where the dot $\cdot$ in the first numerator is the inner product of
the two functions $q_1(x^i)$ and $q_2(x^i)$. The measure is
symmetric, i.e., $m^i_{12} = m^i_{21}$. Thus, if one source has an offset,
the matching measure cannot determine which of the two sources
has that offset. This can be solved by comparing multiple pair
matches given at least three redundant sources of information.

^Incomplete information: If, e.g., a location input GL is transformed
into the global orientation module GO, the result specifies only
one subvector in the direction of the arm, while the other,
perpendicular subvector remains unspecified. The second approach
can then easily fuse a complete GO input onto this incomplete
information.

To identify faulty sensory information, nMMF computes a
plausibility value for each information source $i$ by comparing
it to multiple other redundant information sources. The
most direct comparison is done by determining the mean of the
matches of channel $i$ with all other channels $j$, whose information
was transferred to module $i$:

$$ \langle m^i \rangle = \frac{1}{N - 1} \sum_{j \neq i} m^i_{ij}. \tag{21} $$

The measure may be termed an absolute plausibility measure of
information source $i$. To obtain the final plausibility value, the
relative matching quality is determined by dividing $\langle m^i \rangle$ by the
highest absolute plausibility measure $\langle m \rangle^*$ of all related sources:

$$ m^i = \frac{\langle m^i \rangle}{\langle m \rangle^*}. \tag{22} $$



The whole process is illustrated in Figure 5. In the illustration,
sensor $s_4$ is assumed to have a systematic error. As this sensor is
included in every comparison in its own module, but only
once in each other module, the arithmetic mean of its matching
values is lower than that of the others. In our experience,
this approach of comparing pairs of information sources is more
robust than, for example, comparing one sensor to the combined
information of all other sensors.

FIGURE 5 | Matches $m^i_{ij}$ for pairs of two sources are obtained, then an
arithmetic mean over all $j$ yields $\langle m^i \rangle$. Finally, a normalization by the
maximum of all $\langle m^j \rangle$ yields the final plausibility $m^i$.

In summary, if a channel i is in accordance with most of the 
other channels, the plausibility estimate will be relatively high. 
In contrast, if a specific channel i systematically deviates from all 
other channels, its plausibility estimate will be relatively low. 
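The acquisition of plausibilities can be sketched as follows. The sketch assumes that the match is the normalized scalar product of two PMFs and that `sources[i][j]` holds the PMF of source `j` after projection into module `i`; the function names and this data layout are hypothetical and chosen for illustration only:

```python
import math


def match(q1, q2):
    """Normalized scalar product of two PMFs (cf. Equation 20)."""
    dot = sum(a * b for a, b in zip(q1, q2))
    norm1 = math.sqrt(sum(a * a for a in q1))
    norm2 = math.sqrt(sum(b * b for b in q2))
    return dot / (norm1 * norm2)


def plausibilities(sources):
    """Return one plausibility per source (cf. Equations 21 and 22).

    sources[i][j]: PMF of source j as represented in module i.
    """
    n = len(sources)
    mean_match = []
    for i in range(n):
        # Compare source i with every other source inside module i.
        matches = [match(sources[i][i], sources[i][j])
                   for j in range(n) if j != i]
        mean_match.append(sum(matches) / (n - 1))
    best = max(mean_match)
    # Normalize by the highest mean match, so the best source gets 1.
    return [m / best for m in mean_match]


# Toy example: three agreeing sources and one source with an offset.
# For simplicity, every module is assumed to see the same four PMFs.
a = [0.1, 0.8, 0.1, 0.0, 0.0]
b = [0.2, 0.6, 0.2, 0.0, 0.0]
c = [0.1, 0.7, 0.2, 0.0, 0.0]
d = [0.0, 0.0, 0.1, 0.8, 0.1]  # failing sensor
plaus = plausibilities([[a, b, c, d] for _ in range(4)])
```

In this toy setup the shifted source `d` receives by far the lowest plausibility, while the best-matching source is assigned a plausibility of 1.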

2.5.2. Usage of plausibilities 

To incorporate the plausibility estimates into the sensor fusion
process, the contribution of each information source $i$ is weighted
by its plausibility estimate $m^i$. This is done by Equation (12),
where the exponent needs to depend on the plausibility $m^i$.
As boundary constraints, the exponent must be 0 for $m^i = 0$ and
1 for $m^i = 1$, and it should increase strictly monotonically
with $m^i$. We simply set the exponent equal to $m^i$,
which meets these constraints.
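The effect of such an exponent can be illustrated with a small sketch. Since Equation (12) is not repeated here, the sketch assumes the simplest form, namely raising each probability mass to the exponent and renormalizing, and it assumes equally sized neuron cells; `temper` and `entropy` are hypothetical helper names:

```python
import math


def temper(q, exponent):
    """Raise a PMF to an exponent and renormalize it.

    Assumes equally sized neuron cells, so no volume weighting is applied.
    """
    powered = [p ** exponent for p in q]
    total = sum(powered)
    return [p / total for p in powered]


def entropy(q):
    """Shannon entropy of a PMF in nats."""
    return -sum(p * math.log(p) for p in q if p > 0.0)


sensor = [0.05, 0.9, 0.05]
trusted = temper(sensor, 1.0)     # plausibility 1: distribution unchanged
mistrusted = temper(sensor, 0.2)  # low plausibility: flatter distribution
ignored = temper(sensor, 0.0)     # plausibility 0: uniform distribution
```

With this choice a fully plausible signal keeps its shape, a mistrusted signal is widened and thus contributes less information to the fusion, and a signal with plausibility 0 carries no information at all.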

2.6. INTERACTIVE INFORMATION FLOW 

With all options for information fusion at hand, we can finally 
specify the iterative information flow in nMMF. nMMF main- 
tains an arm state estimate over time by executing four processing 
steps in each time step: a prediction step (A), a sensor fusion step 
(B), an update step (C), and a crosstalk step (D) (cf. Figure 6). 
The prediction step includes the impact of the movement on the 
estimates. The sensor fusion step first increases the dispersion 
of those sensory distributions that badly match other sensors. 
After that, the modified sensory distributions are fused. The next 
step integrates the sensor fusion result into the estimate of the 
body state. The last step enforces synchronization between the 
individual modules of the body state. 

2.6.1. Prediction step 

In order to be able to use the information from previous time
steps, the impact of any movement of the arm on the state
estimates $q(x)$ is predicted. First, the arm movement $\Delta y$ and motor
noise $P_{\Delta y}$ are projected from motor space to all nMMF modules
by linear approximations, resulting in $\Delta y^i$ and $P^i_{\Delta y}$. The involved
Jacobians can be found in Ehrenfeld and Butz (2013).

Second, the impact of the movement is predicted by convolving
the probability distribution of the last time step $q^i_{t-1|t-1}(x^i)$
with the Gaussian $N(\Delta y^i, P^i_{\Delta y})$. This convolution can be
understood as a translation of $q^i_{t-1|t-1}(x^i)$ along the vector $\Delta y^i$
and a blurring with the covariance $P^i_{\Delta y}$. Thus the activity
$q^i_{n,t-1|t-1}$ of some source neuron $n$ in module $i$ flows to all
target neurons $l$ in the same module. The consequent a priori activity
of target neuron $l$ after movement but before any sensor consideration
can be determined by:

$$ q^i_{l,t|t-1} = \sum_n q^i_{n,t-1|t-1} \, \frac{V_l \, N(x_n + \Delta y^i, P^i_{\Delta y})(x_l)}{\sum_{l^*} V_{l^*} \, N(x_n + \Delta y^i, P^i_{\Delta y})(x_{l^*})}, \tag{23} $$

where the derivation is specified in the Appendix, cf.
Equation (31). The equation sums up the activities from all
source neurons $n$, where $N$ is the Gaussian, which does the
translation and blurring. The normalization in the denominator
ensures that the activity that flows from each source neuron $n$ is
preserved.

FIGURE 6 | Data flow for one limb: for simplicity, the inter-limb
dependencies are not shown. First, the forward model predicts the
state estimate after the movement (A). Second, the measurements
are transformed from all modality frames to all other frames (dashed
lines), where their respective qualities are calculated (B.1). Third,
copies of the original measurements are fused weighted with both
the quality and the quantity of their information (B.2). These fused
measurements are then integrated in their respective modality frame
(C). Lastly, the crosstalk shifts all state estimates toward all other
estimates, synchronizing them (D). (A-D) are then repeated for other
limbs and other time steps. Modified based on Ehrenfeld and Butz
(2012, 2013).

2.6.2. Multi-sensor fusion

During multi-sensor fusion, conflicting information content
is first reduced by deriving sensory plausibilities for each module
(Equation 22) and modifying the sensory inputs using
Equation (12). Second, the modified distributions are projected
across modules (Equation 19) in order to provide each module
with all the sensory input. During this projection, chains of
transformation steps accumulate information from more and
more modules along the way. Finally, in each module $i$, the
underlying distribution is fused with the outputs from all
three chains (forward, inverse, and distal-to-proximal). With
Equation (11) the fusion is:

$$ s^{i,fused}_{l,t} = \frac{V_l^{-3} \, s^i_{l,t} \, s^i_{l,t|for} \, s^i_{l,t|inv} \, s^i_{l,t|dis}}{\sum_{l^*} V_{l^*}^{-3} \, s^i_{l^*,t} \, s^i_{l^*,t|for} \, s^i_{l^*,t|inv} \, s^i_{l^*,t|dis}}, \tag{24} $$

where the notation $|xyz$ is used to indicate the particular
sensory information source that is projected into module $i$ and
$s^i_{l,t|xyz}$ denotes neuron $l$'s share of this information^. The
denominator normalizes the result.

2.6.3. Sensor integration 

After sensor fusion, the fused sensor distributions $s^{i,fused}_t$
(Equation 24) are fused again, but this time with the a priori state
estimate distributions $q^i_{t|t-1}$ resulting from the prediction step
(Equation 23). The resulting posterior distribution before the
final crosstalk step (denoted by ~) thus equates to:

$$ \tilde{q}^i_{l,t|t} = \frac{V_l^{-1} \, q^i_{l,t|t-1} \, s^{i,fused}_{l,t}}{\sum_{l^*} V_{l^*}^{-1} \, q^i_{l^*,t|t-1} \, s^{i,fused}_{l^*,t}}. \tag{25} $$

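The fusion of probability masses in a population code can be sketched as below. The sketch assumes that each neuron's mass is its probability density times its Voronoi volume, so that a product of $k$ masses has to be divided by the volume $k - 1$ times before renormalizing; this reading of the fusion equations, the function name `fuse`, and the example values are assumptions made for illustration:

```python
def fuse(masses, volumes):
    """Fuse several PMFs that are encoded over the same neurons.

    masses:  list of PMFs (one list of probability masses per source).
    volumes: Voronoi volume of each neuron (hypothetical example values).
    """
    k = len(masses)
    fused = []
    for l, v in enumerate(volumes):
        product = 1.0
        for q in masses:
            product *= q[l]
        # Each mass already contains one factor of the cell volume.
        fused.append(product / v ** (k - 1))
    total = sum(fused)
    return [a / total for a in fused]


volumes = [1.0, 1.0, 2.0]  # the last cell is twice as large
prior = [0.25, 0.25, 0.5]  # uniform density: mass proportional to volume
sensor = [0.1, 0.6, 0.3]
posterior = fuse([prior, sensor], volumes)
```

The volume weighting ensures that fusing with an uninformative (uniform-density) prior leaves the sensor distribution unchanged, even though the neurons cover cells of different sizes.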

2.6.4. Multi-body state fusion 

Finally, the module interaction in nMMF is applied to ensure
that the state estimates stay consistent across the modules. This
is done the same way as in multi-sensor fusion, except that afterwards
the resulting distributions are modified such that each one
has the same entropy as it had before (using Equations 12-14).
Thus, during multi-body state fusion, information is first
erroneously gained, and then corrected for by artificial information
loss. The crosstalk step essentially shifts the means and shapes
of each distribution toward other modules, ensuring consistency
over modules. It does so without changing the distribution width.
As a result, we have determined the final posterior distribution
encoded by the probability masses in all neurons $l$ for all modules
$i$, denoted by $q^i_{l,t|t}$.

^While $q$ denotes the probability mass of a body state estimate, $s$ denotes the
probability mass of a neuron's response to sensory input.

This step concludes the iterative information processing in
nMMF, which continuously cycles over these processing steps
over time (cf. Figure 6). In the following, we validate the
functionalities and capabilities of nMMF.
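The entropy restoration used in the crosstalk step can be sketched as follows. Equations (12-14) are not repeated in this section, so the sketch assumes an exponent-based modification with equally sized neuron cells and finds the exponent by bisection; the bisection and all function names are choices made for this illustration and not necessarily the procedure used in nMMF:

```python
import math


def entropy(q):
    """Shannon entropy of a PMF in nats."""
    return -sum(p * math.log(p) for p in q if p > 0.0)


def temper(q, exponent):
    """Raise a PMF to an exponent and renormalize it."""
    powered = [p ** exponent for p in q]
    total = sum(powered)
    return [p / total for p in powered]


def restore_entropy(q, target, lo=1e-3, hi=10.0, iterations=60):
    """Temper q such that its entropy matches the target entropy.

    The entropy of the tempered PMF does not increase with the exponent,
    so a simple bisection over the exponent is sufficient.
    """
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if entropy(temper(q, mid)) > target:
            lo = mid  # still too flat: a larger exponent is needed
        else:
            hi = mid  # too sharp: a smaller exponent is needed
    return temper(q, 0.5 * (lo + hi))


before = [0.2, 0.6, 0.2]             # estimate before the crosstalk
after_crosstalk = [0.05, 0.9, 0.05]  # overconfident after fusion
corrected = restore_entropy(after_crosstalk, entropy(before))
```

In this symmetric toy example, the corrected distribution regains the width of the estimate before the crosstalk, while in general only the entropy, not the exact shape, is restored.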

3. RESULTS 

To test if nMMF is capable of maintaining a coherent body 
state estimate, we evaluated nMMF in a simple arm model 
setup, in which a simulated sensor failure occurs temporarily. We 
then analyzed whether the sensor failure can be detected (sec- 
tion 3.2); whether the sensor failure can be compensated for 
(section 3.3); how the available, partially conflicting information 
is propagated across modality frames (section 3.4); and if the 
distal-to-proximal mappings improve nMMF's state estimation 
(section 3.5). 

3.1. ARM SETUP 

To keep it simple, we use a minimally complex arm, which 
still shows all essential characteristics (i.e., modules that differ 
with respect to modalities, frames of reference and limbs, and 
cross-module interactions as in section 2.6). Specifically, a simu- 
lated planar arm with two limbs is used. The arm is controlled 
by a kinematic simulator, disregarding angular momentum or 
gravity. The simulator executes noisy movements with mean 
zero in the (x,y)-plane. The motor noise in the angular modules is

$$ \sigma^{LA_1}_{movement} = \sigma^{LA_2}_{movement} = (0 \;\; 0 \;\; 0.1\,\mathrm{rad})^T. \tag{26} $$

Each limb has one degree of freedom and a length equal to 1. 
Results are averaged over 200 runs. In each run, the arm is ini- 
tially set to a new random position, while the state estimates start 
with uniform distributions (i.e., no knowledge). 

3.1.1. Distribution of neurons

Both neurons and mappings are built once before starting all 200
runs. The angles $x^{LA_1}$ and $x^{LA_2}$ can take on values in the interval
$(-\pi, \pi)$ on the z-axis. The direction parts of the global (local)
orientation $x^{GO_1}$ ($x^{LO_1}$) and $x^{GO_2}$ ($x^{LO_2}$), as well as the location
of the elbow, are on the unit circle. Thus, the populations in the
modules $LA_1$, $LO_1$, $GO_1$, $GL_1$, $LA_2$, $LO_2$, and $GO_2$ all need to cover
lines with the length $2\pi$. Only the wrist location deviates from
this: it must cover a whole disk with radius 2.

Two hundred neurons are sampled in each of the former
modules. Thus the average Euclidean distance between two
neighboring neurons equals

$$ d_{avg} = \frac{2\pi}{200} \approx 0.031. \tag{27} $$



The minimum allowed distance between two neurons (cf. section
2.2.1) is set to $d_{min} = 0.7 \cdot d_{avg}$. In order to achieve the same
average distance in $GL_2$, the number $N^{GL_2}$ of neurons which need to
be sampled is defined by

$$ \sqrt{\frac{\pi r^2}{N^{GL_2}}} = \frac{2\pi}{200}. \tag{28} $$

The $GL_2$ neurons are distributed on a disc with radius
$r = 2 + 3\sigma_{Map} = 2.09$. The summand 2 accounts for the two limb
lengths from shoulder to wrist, while $3\sigma_{Map}$ (cf. section 3.1.2)
guarantees that some neurons have receptive fields outside but
close to the arm's reach. This slightly enlarged neural coverage
prevents boundary effects from distorting a probability distribution.
The enforced equality (Equation 28) yields $N^{GL_2} = 14.0 \cdot 10^3$
neurons.
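The neuron count can be retraced with a few lines of arithmetic. The sketch assumes that the enforced equality assigns each $GL_2$ neuron an area of $d_{avg}^2$ on the disc; the variable names are chosen for this example:

```python
import math

neurons_per_line = 200
d_avg = 2 * math.pi / neurons_per_line      # average distance, Equation (27)
sigma_map = d_avg                           # mapping spread, section 3.1.2
radius = 2 + 3 * sigma_map                  # two limb lengths plus margin
n_gl2 = math.pi * radius ** 2 / d_avg ** 2  # Equation (28) solved for N
```

Evaluating these expressions gives a radius of about 2.09 and roughly 14,000 neurons, in line with the values stated above.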

3.1.2. Mappings 

We chose the standard deviation for the mapping's spread
(cf. Equation 16) so that it is equal to the average neuron
distance, i.e., $\sigma_{Map} = d_{avg} \approx 0.031$. The mappings spread radially,
i.e., $R_{Map} = \mathrm{diag}(\sigma_{Map})$, where diag refers to a diagonal matrix.
We discarded any mappings that fall outside a $3\sigma_{Map}$ range.

3.1.3. Tracking of information

In order to track the information influence stemming from one 
module (here GL2), we (1) introduced an offset to GL2 and (2) set 
its noise very low when compared to the other modules. The offset 
is introduced for two reasons: to distinguish the information that 
originates in GL2 from all other information, and to observe how 
nMMF reacts to the sudden failure of a sensor. The offset has a 
magnitude of 0.5 limb length. It is switched on at time t = 4 and 
switched off again at t = 7. The offset is in a counterclockwise 
direction (i.e., from the arm's perspective, the offset is to the left). 
GL2's noise is low compared to other modules, in order to increase 
GL2's impact. We chose radial Gaussians for the sensor noise: 

$$ \sigma^i = \begin{cases} 0.05 \text{ limb length} & \text{if } i = GL_2 \\ 0.5 \text{ (in rad, limb length, \ldots)} & \text{otherwise,} \end{cases} \tag{29} $$

where $\sigma$ is the standard deviation.

Evaluating nMMF when conflict resolution is applied allows 
us to determine whether the sensor failure can be detected and 
how well nMMF compensates for it. When conflict resolution is 
turned off, the setup shows how information starting in GL2 is 
generally propagated across modalities, frames of reference, and 
limbs. 

3.2. DETECTION OF SENSOR FAILURE 

A sensor failure is modeled by the GL2 sensor offset during
the interval $t \in [4, 6]$. By comparing all sensors, nMMF
autonomously infers plausibility measures (Equation 22), which
are displayed in Figure 7.

Even outside the offset interval, GL2 (top right) shows a low
plausibility $m$ as compared to other modules. This is because,
in general, three aspects characterize a distribution: its mean, its
shape, and its dispersion. However, deciding which of these
characteristics should be tested by a matching measure $m$ depends
on the application. For instance, Equation (22) compares all
three characteristics. As GL2's receptive field (Equation 29) is
narrower than all other receptive fields, its dispersion is lower, and
$m^{GL_2}$ mainly detects the different dispersions, while it might
be more interesting to instead detect systematic errors of the
mean. Thus, for this application, a dispersion-independent measure
(Ehrenfeld and Butz, 2012, 2013) might be more appropriate.
This would yield much higher measures $m^{GL_2}$ than shown in
Figure 7, top-right.

Nevertheless, the measure is still able to detect sensor failure:
while the offset is present ($t \in [4, 6]$), the plausibility measure
drops in the setup with offset (red), as compared to the setup
without offset (yellow) (Figure 7, top-right).

3.3. COMPENSATION OF SENSOR FAILURE 

Plausibilities were introduced as a measure of quality of an
information source. If all sources provide correct data, plausibilities
introduce a random change on otherwise Bayesian fusion. Such
a change can only worsen the state estimate. The results confirm
this: with plausibilities switched on, state estimates get worse (cf.
red vs. yellow, blue vs. green in Figure 8). If, however, a sensory
source conflicts with the others (red and yellow in the interval
$t \in [4, 6]$), plausibilities can suppress the influence of the false
sensor information and improve the overall state estimate (red
vs. yellow in Figure 8). This improvement is even visible under
strong noise (red vs. yellow in Figure 8). Again, a
dispersion-independent measure (Ehrenfeld and Butz, 2012, 2013) could
improve the performance.

3.4. PROPAGATION OF INFORMATION ACROSS MODALITIES, FRAMES 
OF REFERENCE AND LIMBS 

The setup without conflict resolution (Figure 8, yellow and 
green) shows how information is propagated across modality 
frames and limbs in general. The yellow peak, which starts in GL2 
(top right), is successfully propagated to all other modality frames 
(from top to bottom) and to the next proximal limb (from right to 
left). Shown is the estimation error (Euclidean distance between 
the real arm state and the estimated arm state). 

3.5. PERFORMANCE IMPROVEMENT DUE TO DISTAL-TO-PROXIMAL 
MAPPINGS 

In order to see if distal-to-proximal mappings improve or worsen
the state estimation, two setups, one with mappings and one
without, are compared. Figure 9 shows that the proximal limb's
state estimate improves (yellow vs. blue, red vs. purple) because
additional information flows to it from the distal limb. A
slight improvement can even be seen in the distal limb. This
is the case because the distal limb profits from more accurate
forward and inverse kinematic estimates in the proximal
limb.

FIGURE 7 | Sensor failure is detected: in the GL2 module, where the
sensor offset is introduced in time steps $t \in [4, 6]$, the plausibility
drops. Error bars are standard errors.

FIGURE 8 | An offset is propagated from GL2 to other modality frames
and toward the upper arm (dashed yellow). The usage of plausibilities
reduces the offset's influence (the solid red curve is lower than the dashed
yellow curve).

FIGURE 9 | (1) Distal-to-proximal mappings improve the state estimate
(dashed yellow is lower than dash-dotted cyan, solid red is lower than
dotted magenta). (2) Plausibilities worsen the state estimate if no
failures exist and improve the state estimate if failures exist (red vs.
yellow, magenta vs. cyan). Both effects (1) and (2) are found, though
weaker, in other modules (not shown).

4. DISCUSSION 

We introduced the neurally encoded modular modality frame
(nMMF) model, which maintains a consistent and robust but
also highly distributed body state estimate over time. As in
the previously published Gaussian MMF model (Ehrenfeld and
Butz, 2011, 2012, 2013), nMMF represents the body (an arm
in the current implementation) modularized into body parts
and sensor-respective frames of reference. Local, body-state-dependent
mappings allow for continuous interactions between
modules, ensuring consistency. Bayesian information fusion
principles are applied to fuse sensory information in the respective
modules, to compare redundant information across modules,
and to adjust the modular body state estimate for maintaining
estimation consistency. Forward models are used to
anticipate the sensory consequences of own movements and
thus to fuse the consequent sensory information even more
effectively.



In contrast to the MMF model, we showed that the same
principles can be realized by means of a neural implementation,
adding to the plausibility of the model. To succeed, population
encoding principles of state estimates had to be employed.
To establish a population code in one nMMF module, arm
states were sampled randomly. To establish the neural mappings
between the population codes, weight matrices were set based on
the distances of the connected neurons, where the distances are
currently determined by an informed kinematic model of the
arm. To determine plausibility values, we used the scalar product
to compare two neurally encoded distributions. To avoid
overconfidence in body states and to effectively realize information
fusion, we normalized the resulting distributions, maintaining the
respective Shannon entropies in the neural encodings.

In further contrast to the MMF model, nMMF also includes 
information exchanges from distal to proximal limbs and joints. 
This addition enables further-reaching information exchange. For
example, information about the hand location can also influence 
estimates of the lower and upper arm, which was not the case in 
the MMF model (Ehrenfeld and Butz, 2013). 

The evaluations confirmed that information from the wrist 
location influenced the whole arm estimate. First, we showed 
that due to the addition of the distal-to-proximal mappings, the 
location of the elbow or angles in the shoulder were adjusted by 
nMMF to generate an overall representation that is more consis- 
tent with the wrist estimate. We also showed that the additional 
mappings improve the state estimate due to the additional infor- 
mation exchange. Second, we showed that a systematic sensor 
error can be detected with the neural encoding. Third, although 
the inclusion of plausibilities slightly decreases the quality of 
the state estimate when all information sources are valid, if a 
sufficiently strong systematic error occurs in a sensor then the 
plausibility estimate can block this inconsistent information. Such 
sensor errors can be compared with situations in which visual 
information about the location of the hand is inaccurate, as is 
the case in the rubber hand illusion, thus leading to a misjudg- 
ment of the hand's location. The distal-to-proximal mappings 
in nMMF suggest, in addition to a misplacement of the hand, 
that the internal estimates of the elbow angles and lower arm 
orientations should be affected by the illusion. 

4.1. RELATED MODELS 

The original motivation to develop the nMMF model came from 
SURE_REACH (Butz et al., 2007), a neural, sensorimotor redun- 
dancy resolving architecture, which models human arm reaching. 
SURE_REACH and the strongly related posture-based motion 
planning approaches (Rosenbaum et al., 2001; Vaughan et al., 
2006) focused on flexible goal reaching capabilities and on antic- 
ipatory behavior capabilities, such as modeling the end state 
comfort effect (Rosenbaum et al., 1990). The current state of the
body, although incorporated during action decision making, was 
not explicitly represented. In contrast, nMMF primarily focuses 
on the probabilistic, distributed representation of the body and 
effective information exchange. However, we believe that the
nMMF model is ready to be combined with goal-oriented behavioral
decision making, planning, and control routines. Moreover,
while the SURE_REACH model was also implemented by neural
grids, it represented the angular space of the arm in one module.
Such a representation, however, is unfeasible for a seven degree of
freedom, humanoid arm. nMMF's modularizations yield spatial
encodings that are maximally three dimensional. Thus, nMMF is
applicable to a seven degree of freedom arm. In particular, while
SURE_REACH needs $O(x^7)$ neurons to cover the angular space
of a humanoid arm with a density of $1/x$ neurons per dimension,
nMMF only needs $O(3x^3)$ neurons to encode a comparable
density.

The locality and modularity of nMMF relate the model 
to the mean of multiple computations (MMC) model (Cruse
and Steinkühler, 1993; Schilling, 2011). However, nMMF
additionally provides a probabilistic state representation, 
rigorous Bayesian-based information exchange, and plausibility- 
enhanced sensory information integration mechanisms. While 
the MMC model focuses on motor control, the nMMF model 
focuses on an effective, probabilistic body state represen- 
tation. Nonetheless, the similarity to MMC suggests that 
similar motor control routines are implementable on a neural 
level in nMMF. Moreover, the fact that distributed, multi- 
sensory bodily representations serve well for goal-directed 
motor control (Andersen and Buneo, 2002) suggests that 
nMMF should be extended with adaptive motor control 
capabilities. 

Various models use population codes for encoding proba- 
bility distributions and exchange information in a comparable 
Bayesian fashion (Deneve and Pouget, 2004; Knill and Pouget, 
2004; Doya et al., 2007). Information exchange across modalities
and frames of reference takes place in the brain. Gain fields
are good candidates for realizing frame-of-reference conversions
neurally (Andersen et al., 1985; Salinas and Abbott, 1995; Hwang
et al., 2003; Deneve and Pouget, 2004). In the current nMMF
implementation we used fully connected, direct transformations, 
which will need to be adjusted to gain-field transformations 
in order to map two three dimensional spaces into a third 
space. Nonetheless, in contrast to the related models, nMMF 
realizes a fully modularized, distributed probabilistic arm rep- 
resentation, which, to the best of our knowledge, has not been 
accomplished before. For example, Deneve and Pouget (2004) 
reviewed a multimodal gain field model that exchanged audi- 
tory, visual, and eye position information, enforcing consistency 
via population encodings. While nMMF has not considered audi- 
tory information so far, it goes beyond previous models in that 
it also incorporates a kinematic chain, relating body parts to 
each other along the chain. Thus, besides exchanging information
across different frames of reference, nMMF also exchanges
information from distal to proximal body parts and vice
versa.

In sum, nMMF focuses on estimating the own body state, 
incorporating multiple sources of information across sensory 
modalities and their respective frames of reference, as well 
as across neighboring body parts. While flexible goal-oriented 
behavior cannot be generated by nMMF at this point, the 
relations to the MMC model, the SURE_REACH model, and 
the posture-based motion planning theory suggest that behav- 
ioral decision making, planning, and control techniques can be 
incorporated. 



4.2. FUTURE WORK 

Although the plausibility measure used in this work is generally
well suited, our previous work showed that a more rigorous
normalization can yield very little information loss but the same gain
in robustness when plausibilities are applied (Ehrenfeld and Butz, 
2012, 2013). A similar normalization in the neural implementa- 
tion seems to be possible only by means of heuristics, lacking the 
computational rigor. We are currently investigating alternatives. 

In the current nMMF implementation several choices had to 
be made about which information should be exchanged, how 
plausibilities should be computed, and which reference frames 
should be represented. Additional frames-of-reference could be 
represented, such as a local location frame. Synergistic body 
spaces may also be represented, potentially accounting for the 
synergistic properties of the human body, the muscle arrange- 
ments, and the neural control networks involved (Latash, 2008). 
Also, plausibilities may be determined by considering the internal 
state estimations in addition to the redundant sensory infor- 
mation sources. Finally, the transformations between limbs and 
frames-of-reference may also be endowed with uncertainties. In 
this way, the body model itself would become adjustable, poten- 
tially accounting for illusions such as the Pinocchio illusion 
(Lackner, 1988), where a body part (e.g., the nose) elongates 
phenomenally. 

Due to its modularity and focus on bodily representations, we 
believe that nMMF can be easily integrated into a layered con- 
trol architecture. In such an architecture, other layers may encode 
extended bodily motion primitives, plan the desired kinematics 
of bodily motions, or control the dynamics of the body. In partic- 
ular, extended motion primitives may be incorporated in order 
to execute a motion sequence, potentially selectively with any 
limb or joint currently available, similar to us being able to push 
down a door handle by means of our hands but also potentially 
with one of our elbows. Meanwhile, kinematic planning mecha- 
nisms may utilize the nMMF representation to generate motion 
plans online. Finally, lower-level dynamic control layers may be 
included. 

5. CONCLUSION 

In conclusion, this paper has shown that a distributed, probabilis- 
tic bodily representation can be encoded by modularized neural 
population codes based on Bayesian principles. The presented 
nMMF architecture is able to mimic the capability of humans to 
integrate different sources of information about the body on the 
fly, weighted by the respective information content. Bodily illu- 
sions can also be mimicked. Besides the more rigorous modeling 
of human data with nMMF beyond qualitative comparisons, we 
believe that nMMF should be embedded in a layered representa- 
tion and adaptive control architecture in order to generate flexible 
and adaptive goal-oriented behavior.

A APPENDIX

A.1 PSEUDOINVERSE AND ACTIVATION OF A GAUSSIAN 

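A multivariate Gaussian with mean $\mu$ and covariance matrix $P$ is
given by:

$$ N(\mu, P)(x) = \frac{1}{\sqrt{(2\pi)^k |P|}} \exp\left(-\frac{1}{2} (x - \mu)^T P^{-1} (x - \mu)\right), \tag{30} $$

where $k$ is the number of dimensions in the manifold.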

In nMMF, three cases occur where activity is spread 
over neighboring neurons: when sensory inputs are encoded 
(Equation 6), when neural activity is propagated to other mod- 
ules (Equation 16), and when the body estimate is updated with 
the movement (Equation 23). If the involved tuning functions 
are Gaussian, the activity mass spreads to all individual neurons $i$
according to

$$q_i(\mu, P) = f \cdot \frac{V_i \, \mathcal{N}(\mu, P)(x_i)}{\sum_{i^*} V_{i^*} \, \mathcal{N}(\mu, P)(x_{i^*})} \qquad (31)$$

where $\mu$ is the new mean, $P$ the tuning function's covariance,
$V_i$ the Voronoi volume of neuron $i$ (Section A.2), and $f$ the
activity mass, which is spread. For instance, if a sensory input is
activated, $\mu$ is equal to the sensory reading and $f$ is 1. If,
on the other hand, the activity of a single neuron $x_n$ is updated
with a movement $\Delta x$, $\mu$ is equal to $x_n + \Delta x$, and
$f$ is equal to the neuron's probability mass.
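
As an illustration of Equation (31), the following minimal sketch spreads an activity mass over a small population. It is not the original implementation: the function names, the 1D grid of 11 neurons, the covariance value, and the choice of equal relative volumes are all assumptions made for this example.

```python
import numpy as np

def gaussian_density(x, mu, P):
    """Evaluate N(mu, P) at each row of x (shape: n_points x k)."""
    k = mu.shape[0]
    diff = x - mu
    P_inv = np.linalg.inv(P)  # assumes P is invertible in this sketch
    # Quadratic form (x - mu)^T P^{-1} (x - mu) for every row.
    quad = np.einsum("ij,jk,ik->i", diff, P_inv, diff)
    norm = np.sqrt((2.0 * np.pi) ** k * np.linalg.det(P))
    return np.exp(-0.5 * quad) / norm

def spread_activity(positions, volumes, mu, P, f=1.0):
    """Distribute the activity mass f over all neurons as in Equation (31)."""
    weights = volumes * gaussian_density(positions, mu, P)
    return f * weights / weights.sum()

# Hypothetical 1D population: 11 neurons evenly spaced on [0, 1].
positions = np.linspace(0.0, 1.0, 11).reshape(-1, 1)
# Assumption: equal relative volumes for all neurons, for simplicity.
volumes = np.ones(len(positions))

# A sensory reading of 0.42 is encoded with activity mass f = 1.
q = spread_activity(positions, volumes,
                    mu=np.array([0.42]), P=np.array([[0.01]]), f=1.0)
```

Because of the normalization in the denominator, the activities in `q` sum to the activity mass `f`, and the neuron closest to the reading receives the largest share.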

If no inverse $P^{-1}$ exists, it is approximated with the
pseudoinverse $P^+$. The pseudoinverse is computed via a singular
value decomposition, which factorizes the (real) $m \times m$
covariance $P$ into

$$P = U \Sigma V^T \qquad (32)$$

where $U$ and $V$ are unitary and $\Sigma$ is diagonal. $U$ and $V$
can be understood as rotation matrices, while $\Sigma$ is responsible
for the scaling.
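
The factorization in Equation (32) can be checked numerically. The sketch below uses NumPy's `np.linalg.svd` on a hypothetical rank-deficient covariance chosen only for illustration; it shows that such a matrix has a zero singular value, which is why no ordinary inverse exists.

```python
import numpy as np

# Hypothetical singular 2x2 covariance: all variance lies along (1, 1).
P = np.array([[1.0, 1.0],
              [1.0, 1.0]])

# np.linalg.svd returns U, the diagonal of Sigma (descending), and V^T.
U, s, Vt = np.linalg.svd(P)
Sigma = np.diag(s)

# Reassemble P = U Sigma V^T to confirm the factorization.
reconstructed = U @ Sigma @ Vt

# U is orthogonal (a real unitary matrix), so U^T U is the identity.
orthogonality_error = np.abs(U.T @ U - np.eye(2)).max()

# One singular value is (numerically) zero, so P has rank 1.
rank = np.linalg.matrix_rank(P)
```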
Then the pseudoinverse $P^+$ is

$$P^+ = V \Sigma^+ U^T \qquad (33)$$

where the pseudoinverse $\Sigma^+$ of the diagonal matrix $\Sigma$ is
obtained by taking the reciprocal of every non-zero element. If a
diagonal element of $\Sigma$ is equal to zero, this has to be
interpreted as the probability distribution not depending on that
element. $\Sigma^+$ is consistent with that interpretation: the
corresponding diagonal element remains zero, and deviations
$(x_i - \mu)$ from the mean in the direction of that element are
multiplied by zero, i.e., they do not lower the result of the
Gaussian (Equation 31). Unfortunately, this is a singularity. For
diagonal elements close but unequal to zero, $\Sigma^+$ and
consequently $P^+$ explode. This occurs especially if a sub-manifold
in a higher dimensional space needs to be activated (e.g., a sphere
of elbow positions).

Thresholds are introduced to prevent discretization errors
and small numerical errors from destabilizing the model. Matrix
elements $(P^{-1})_{jk}$ larger than the threshold $10^{10}$ are set
to zero:

$$(\Gamma)_{jk} = \Theta\left(10^{10} - (P^{-1})_{jk}\right) (P^{-1})_{jk} \qquad (34)$$

where $\Theta$ is the Heaviside function. Next, the distance vectors
$\alpha_i$ and $\beta_i$ are introduced as $x_i - \mu$ and
$\Gamma (x_i - \mu)$ and bounded by the threshold $10^{-10}$:

$$(\alpha_i)_j = \Theta\left([x_i - \mu]_j - 10^{-10}\right) \cdot [x_i - \mu]_j \qquad (35)$$

$$(\beta_i)_j = \Theta\left([\Gamma (x_i - \mu)]_j - 10^{-10}\right) \cdot [\Gamma (x_i - \mu)]_j \qquad (36)$$

Thus, Equation (31) becomes

$$q_i(\mu, P) = f \cdot \frac{V_i \, e^{-\frac{1}{2} \alpha_i^T \beta_i}}{\sum_{i^*} V_{i^*} \, e^{-\frac{1}{2} \alpha_{i^*}^T \beta_{i^*}}} \qquad (37)$$

A.2 VORONOI CELL AND VORONOI VOLUME

When $N$ neurons are spread over a sample space at positions
$x_i$, $i \in (1..N)$, the Voronoi cell $R_i$ of a neuron $i$ is
defined as the set of all points $x$ that are closer to the neuron
position $x_i$ than to the position of any other neuron, i.e.,

$$R_i = \{ x \mid \forall n \neq i : \|x - x_i\| < \|x - x_n\| \} \qquad (38)$$

where $\|\cdot\|$ is the Euclidean norm. Intuitively, it is the
subspace to which the neuron responds more strongly than any other
neuron. The Voronoi volume is defined as the volume of that cell. As
only relative values are required, any normalization of the volumes
$V_i$ is arbitrary.
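
Relative Voronoi volumes can be approximated by sampling. The following sketch is one possible way to do so and is not taken from the original implementation: it assumes that the sample space is an axis-aligned box, draws uniform samples, and assigns each sample to its nearest neuron. The function name, the box bounds, and the three example positions are hypothetical.

```python
import numpy as np

def voronoi_volumes(positions, low, high, n_samples=100_000, seed=0):
    """Estimate relative Voronoi volumes by uniform sampling in a box.

    positions: array of shape (N, k) with the neuron positions.
    low, high: bounds of the (assumed) box-shaped sample space.
    """
    rng = np.random.default_rng(seed)
    k = positions.shape[1]
    samples = rng.uniform(low, high, size=(n_samples, k))
    # Squared Euclidean distance from every sample to every neuron.
    d2 = ((samples[:, None, :] - positions[None, :, :]) ** 2).sum(axis=2)
    # Each sample falls into the Voronoi cell of its nearest neuron.
    nearest = d2.argmin(axis=1)
    counts = np.bincount(nearest, minlength=len(positions))
    # Fractions of samples per cell; they sum to 1 by construction.
    return counts / n_samples

# Hypothetical 1D example: three unevenly spaced neurons on [0, 1].
positions = np.array([[0.1], [0.5], [0.6]])
volumes = voronoi_volumes(positions, low=0.0, high=1.0)
```

In this example the cell boundaries lie at the midpoints 0.3 and 0.55, so the estimates should approach 0.3, 0.25, and 0.45. Since only relative values are required, normalizing by the number of samples is one arbitrary but convenient choice.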